{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 5.5.2 Basics of MPI-parallel NGSolve\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "num_procs = '20'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "from usrmeeting_jupyterstuff import *"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "stop_cluster()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Waiting for connection file: ~/.ipython/profile_ngsolve/security/ipcontroller-kogler-client.json\n",
      "connecting ... try:6 succeeded!"
     ]
    }
   ],
   "source": [
    "start_cluster(num_procs)\n",
    "connect_cluster()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Using MPI through NGSolve\n",
    "For convenience, NGSolve provides access to a little bit of MPI-functionality through the python interface.\n",
    "\n",
    "However, the general philosophy is that NGSolve should handle MPI-stuff on C++ side and not require the user\n",
    "to directly use it.\n",
    "\n",
    "In cases where more MPI functionality is needed, mpi4py can be used.\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Getting Started\n",
    "MPI_Init returns a wrapper around the MPI-communcator used in NGSolve.\n",
    "\n",
    "It provides some basic functionality, for example it can tell us the number of\n",
    "procs in it, and give us the rank of each one:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[stdout:0] Hello from rank  13  of  20\n",
      "[stdout:1] Hello from rank  2  of  20\n",
      "[stdout:2] Hello from rank  15  of  20\n",
      "[stdout:3] Hello from rank  7  of  20\n",
      "[stdout:4] Hello from rank  14  of  20\n",
      "[stdout:5] Hello from rank  6  of  20\n",
      "[stdout:6] Hello from rank  12  of  20\n",
      "[stdout:7] Hello from rank  19  of  20\n",
      "[stdout:8] Hello from rank  18  of  20\n",
      "[stdout:9] Hello from rank  3  of  20\n",
      "[stdout:10] Hello from rank  5  of  20\n",
      "[stdout:11] Hello from rank  17  of  20\n",
      "[stdout:12] Hello from rank  8  of  20\n",
      "[stdout:13] Hello from rank  16  of  20\n",
      "[stdout:14] Hello from rank  10  of  20\n",
      "[stdout:15] Hello from rank  4  of  20\n",
      "[stdout:16] Hello from rank  0  of  20\n",
      "[stdout:17] Hello from rank  11  of  20\n",
      "[stdout:18] Hello from rank  9  of  20\n",
      "[stdout:19] Hello from rank  1  of  20\n"
     ]
    }
   ],
   "source": [
    "%%px\n",
    "from ngsolve import *\n",
    "comm = MPI_Init()\n",
    "print(\"Hello from rank \", comm.rank, ' of ', comm.size)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Additionally, \"comm\" provides:\n",
    "\n",
    "- time measurement\n",
    "- barriers\n",
    "- computing sums, minima, maxima"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[stdout:16] There are  20  of us, which took us  0.017204 seconds to figure out\n"
     ]
    }
   ],
   "source": [
    "%%px\n",
    "t = comm.WTime()\n",
    "s2 = comm.Sum(1)\n",
    "t = comm.Max(comm.WTime()-t)\n",
    "if comm.rank==0:\n",
    "    print('There are ', s2, ' of us, which took us ', round(t,6), 'seconds to figure out')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Distributed Meshes\n",
    "When we load a mesh from a file in parallel, it gets distributed among the ranks and each one gets only a part of it, \n",
    "**rank 0 gets nothing**.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[stdout:0] rank 13's part of the mesh has  4 elements,  4 faces,  9 edges and  6  vertices\n",
      "[stdout:1] rank 2's part of the mesh has  2 elements,  2 faces,  6 edges and  5  vertices\n",
      "[stdout:2] rank 15's part of the mesh has  3 elements,  3 faces,  8 edges and  6  vertices\n",
      "[stdout:3] rank 7's part of the mesh has  2 elements,  2 faces,  7 edges and  6  vertices\n",
      "[stdout:4] rank 14's part of the mesh has  3 elements,  3 faces,  7 edges and  5  vertices\n",
      "[stdout:5] rank 6's part of the mesh has  3 elements,  3 faces,  7 edges and  5  vertices\n",
      "[stdout:6] rank 12's part of the mesh has  4 elements,  4 faces,  9 edges and  6  vertices\n",
      "[stdout:7] rank 19's part of the mesh has  3 elements,  3 faces,  10 edges and  7  vertices\n",
      "[stdout:8] rank 18's part of the mesh has  2 elements,  2 faces,  8 edges and  7  vertices\n",
      "[stdout:9] rank 3's part of the mesh has  3 elements,  3 faces,  8 edges and  6  vertices\n",
      "[stdout:10] rank 5's part of the mesh has  3 elements,  3 faces,  10 edges and  7  vertices\n",
      "[stdout:11] rank 17's part of the mesh has  3 elements,  3 faces,  9 edges and  9  vertices\n",
      "[stdout:12] rank 8's part of the mesh has  3 elements,  3 faces,  7 edges and  5  vertices\n",
      "[stdout:13] rank 16's part of the mesh has  4 elements,  4 faces,  9 edges and  6  vertices\n",
      "[stdout:14] rank 10's part of the mesh has  2 elements,  2 faces,  5 edges and  4  vertices\n",
      "[stdout:15] rank 4's part of the mesh has  3 elements,  3 faces,  8 edges and  7  vertices\n",
      "[stdout:16] rank 0's part of the mesh has  0 elements,  0 faces,  0 edges and  0  vertices\n",
      "[stdout:17] rank 11's part of the mesh has  2 elements,  2 faces,  6 edges and  5  vertices\n",
      "[stdout:18] rank 9's part of the mesh has  3 elements,  3 faces,  7 edges and  5  vertices\n",
      "[stdout:19] rank 1's part of the mesh has  4 elements,  4 faces,  10 edges and  7  vertices\n"
     ]
    }
   ],
   "source": [
    "%%px\n",
    "mesh = Mesh('square.vol')\n",
    "print('rank', str(comm.rank)+\"'s part of the mesh has \", mesh.ne, 'elements, ', \\\n",
    "      mesh.nface, 'faces, ', mesh.nedge, 'edges and ', mesh.nv, ' vertices')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![square_apart](square_apart.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "However, the entire geometry information is present everywhere:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[stdout:0] \n",
      "rank 13 Materials: ('default',)\n",
      "rank 13 Boundaries:  ('bottom', 'right', 'top', 'left')\n",
      "[stdout:1] \n",
      "rank 2 Materials: ('default',)\n",
      "rank 2 Boundaries:  ('bottom', 'right', 'top', 'left')\n",
      "[stdout:2] \n",
      "rank 15 Materials: ('default',)\n",
      "rank 15 Boundaries:  ('bottom', 'right', 'top', 'left')\n",
      "[stdout:3] \n",
      "rank 7 Materials: ('default',)\n",
      "rank 7 Boundaries:  ('bottom', 'right', 'top', 'left')\n",
      "[stdout:4] \n",
      "rank 14 Materials: ('default',)\n",
      "rank 14 Boundaries:  ('bottom', 'right', 'top', 'left')\n"
     ]
    }
   ],
   "source": [
    "%%px --targets 0:5\n",
    "print('rank', comm.rank, 'Materials:', mesh.GetMaterials())\n",
    "print('rank', comm.rank, 'Boundaries: ', mesh.GetBoundaries())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Distributed Finite Element Spaces\n",
    "When we define a Finite Element Space on a distributed mesh, each rank constructs a\n",
    "Finite Element Space on it's part of the mesh."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[stdout:0] fes on rank 13 has 28 DOFs, globally we have  283\n",
      "[stdout:1] fes on rank 2 has 19 DOFs, globally we have  283\n",
      "[stdout:2] fes on rank 15 has 25 DOFs, globally we have  283\n",
      "[stdout:3] fes on rank 7 has 22 DOFs, globally we have  283\n",
      "[stdout:4] fes on rank 14 has 22 DOFs, globally we have  283\n",
      "[stdout:5] fes on rank 6 has 22 DOFs, globally we have  283\n",
      "[stdout:6] fes on rank 12 has 28 DOFs, globally we have  283\n",
      "[stdout:7] fes on rank 19 has 30 DOFs, globally we have  283\n",
      "[stdout:8] fes on rank 18 has 25 DOFs, globally we have  283\n",
      "[stdout:9] fes on rank 3 has 25 DOFs, globally we have  283\n",
      "[stdout:10] fes on rank 5 has 30 DOFs, globally we have  283\n",
      "[stdout:11] fes on rank 17 has 30 DOFs, globally we have  283\n",
      "[stdout:12] fes on rank 8 has 22 DOFs, globally we have  283\n",
      "[stdout:13] fes on rank 16 has 28 DOFs, globally we have  283\n",
      "[stdout:14] fes on rank 10 has 16 DOFs, globally we have  283\n",
      "[stdout:15] fes on rank 4 has 26 DOFs, globally we have  283\n",
      "[stdout:16] fes on rank 0 has 0 DOFs, globally we have  283\n",
      "[stdout:17] fes on rank 11 has 19 DOFs, globally we have  283\n",
      "[stdout:18] fes on rank 9 has 22 DOFs, globally we have  283\n",
      "[stdout:19] fes on rank 1 has 31 DOFs, globally we have  283\n"
     ]
    }
   ],
   "source": [
    "%%px\n",
    "fes = H1(mesh, order=3, dirichlet='bottom|left')\n",
    "print('fes on rank', comm.rank, 'has', fes.ndof, 'DOFs, globally we have ', fes.ndofglobal)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[stdout:16] Strangely, the sum of all local DOFs is  470 != 283\n"
     ]
    }
   ],
   "source": [
    "%%px\n",
    "nd2 = comm.Sum(fes.ndof)\n",
    "if comm.rank==0:\n",
    "    print('Strangely, the sum of all local DOFs is ', nd2, '!=', fes.ndofglobal)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If we just sum up the dimensions of the local spaces $V^i$, we get the dimension of \n",
    "$\\Pi_i V^i$ and not the dimension of \n",
    "\n",
    "$$\n",
    "V = \\Pi_i V^i \\cap C^0(\\Omega)\n",
    "$$\n",
    "\n",
    "Some base functions have to be shared across subdomains. Each subdomain takes the place of a makro Finite Element.\n",
    "![bf_apart](bf_apart.png)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Information about how the DOFs stick together on a global level are stored in \n",
    "the \"ParallelDofs\" object:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[stdout:0] rank 13 has 28 local DOFs, globally we have 283\n",
      "[stdout:1] rank 2 has 19 local DOFs, globally we have 283\n",
      "[stdout:2] rank 15 has 25 local DOFs, globally we have 283\n",
      "[stdout:3] rank 7 has 22 local DOFs, globally we have 283\n",
      "[stdout:4] rank 14 has 22 local DOFs, globally we have 283\n",
      "[stdout:5] rank 6 has 22 local DOFs, globally we have 283\n",
      "[stdout:6] rank 12 has 28 local DOFs, globally we have 283\n",
      "[stdout:7] rank 19 has 30 local DOFs, globally we have 283\n",
      "[stdout:8] rank 18 has 25 local DOFs, globally we have 283\n",
      "[stdout:9] rank 3 has 25 local DOFs, globally we have 283\n",
      "[stdout:10] rank 5 has 30 local DOFs, globally we have 283\n",
      "[stdout:11] rank 17 has 30 local DOFs, globally we have 283\n",
      "[stdout:12] rank 8 has 22 local DOFs, globally we have 283\n",
      "[stdout:13] rank 16 has 28 local DOFs, globally we have 283\n",
      "[stdout:14] rank 10 has 16 local DOFs, globally we have 283\n",
      "[stdout:15] rank 4 has 26 local DOFs, globally we have 283\n",
      "[stdout:16] rank 0 has 0 local DOFs, globally we have 283\n",
      "[stdout:17] rank 11 has 19 local DOFs, globally we have 283\n",
      "[stdout:18] rank 9 has 22 local DOFs, globally we have 283\n",
      "[stdout:19] rank 1 has 31 local DOFs, globally we have 283\n"
     ]
    }
   ],
   "source": [
    "%%px\n",
    "pd = fes.ParallelDofs()\n",
    "print('rank', comm.rank, 'has', pd.ndoflocal, 'local DOFs, globally we have', pd.ndofglobal)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can find out which DOFs are shared with which ranks."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "I am rank  7\n",
      "---\n",
      "I share DOF 0 with ranks: [8]\n",
      "I share DOF 1 with ranks: [1, 2, 17]\n",
      "I share DOF 2 with ranks: [1, 5]\n",
      "I share DOF 3 with ranks: [5]\n",
      "I share DOF 4 with ranks: [1, 5, 8]\n",
      "I share DOF 5 with ranks: [5, 8]\n",
      "I share DOF 6 with ranks: []\n",
      "I share DOF 7 with ranks: []\n",
      "I share DOF 8 with ranks: [8]\n",
      "I share DOF 9 with ranks: [8]\n",
      "... and so forth ...\n",
      "\n",
      "\n",
      "DOFs I share with rank 1 :  [1, 2, 4, 10, 11]\n",
      "DOFs I share with rank 2 :  [1]\n",
      "DOFs I share with rank 5 :  [2, 3, 4, 5, 12, 13, 14, 15, 16, 17, 18, 19]\n",
      "DOFs I share with rank 8 :  [0, 4, 5, 8, 9]\n",
      "DOFs I share with rank 17 :  [1]\n"
     ]
    }
   ],
   "source": [
    "%%px --target=3\n",
    "print('I am rank ', comm.rank)\n",
    "print('---')\n",
    "\n",
    "for k in range(min(10,fes.ndof)):\n",
    "    print('I share DOF', k, 'with ranks:', [p for p in pd.Dof2Proc(k)])\n",
    "    \n",
    "print('... and so forth ...')\n",
    "print('\\n')\n",
    "\n",
    "for p in range(0, comm.size-1):\n",
    "    if len(pd.Proc2Dof(p)):\n",
    "        print('DOFs I share with rank', p, ': ', [p for p in pd.Proc2Dof(p)])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "There are a couple of points to consider here:\n",
    "\n",
    " - Locally, DOFs are numbered 0..ndoflocal-1.\n",
    " - There is no global enumeration!\n",
    " - The local numbering of DOFs is conistent across subdomain boundaries.\n",
    "   (This just means that there exists some global enumeration of DOFs that is cinsistent with the local ones.)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Distributed Weak Formulations & Linear Algebra\n",
    "\n",
    "Linear- or Bilinearforms can be split into subdomain contributions.\n",
    "\n",
    "For example, the usual bilinear form $a(\\cdot, \\cdot)$ associated to Poisson's equation can be split into\n",
    "$a_i(\\cdot, \\cdot)$ defined by:\n",
    "$$\n",
    "a(u,v) = \\sum_i a_i(u, v) = \\sum_i \\int_{\\Omega_i} \\nabla u \\nabla v~dx = \\sum_i a(u_{|\\Omega_i}, v_{|\\Omega_i})\n",
    "$$\n",
    "\n",
    "When we write down BLFs and LFs for distributed FESpace, we actually simply write down\n",
    "it's local contributions. \n",
    "\n",
    "The FESpace figures out how to stick them together to form global forms. \n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%px\n",
    "u,v = fes.TnT()\n",
    "a = BilinearForm(fes)\n",
    "a += SymbolicBFI(grad(u)*grad(v))\n",
    "a.Assemble()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let us see what we get after assembling the bilinear form:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "a.mat is a <class 'ngsolve.la.ParallelMatrix'>\n"
     ]
    }
   ],
   "source": [
    "%%px --target=1\n",
    "print('a.mat is a', type(a.mat))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Parallel Matrices and Vectors\n",
    "\n",
    "The general principle for distributed linear algebra objects is:\n",
    "\n",
    "*Parallel Object = Local Object + ParallelDofs*\n",
    "\n",
    "### Matrices"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[stdout:1] \n",
      "a.mat.local_mat on rank 2 is a <class 'ngsolve.la.SparseMatrixd'> of dimensions 19 19\n",
      "lcoal fes ndof:  19\n",
      "a.mat.row_pardofs:  <ngsolve.la.ParallelDofs object at 0x2b5807f49c38>\n",
      "a.mat.col_pardofs:  <ngsolve.la.ParallelDofs object at 0x2b5807f49c38>\n",
      "fes pardofs:        <ngsolve.la.ParallelDofs object at 0x2b5807f49c38>\n",
      "[stdout:2] \n",
      "a.mat.local_mat on rank 15 is a <class 'ngsolve.la.SparseMatrixd'> of dimensions 25 25\n",
      "lcoal fes ndof:  25\n",
      "a.mat.row_pardofs:  <ngsolve.la.ParallelDofs object at 0x2b824beacc00>\n",
      "a.mat.col_pardofs:  <ngsolve.la.ParallelDofs object at 0x2b824beacc00>\n",
      "fes pardofs:        <ngsolve.la.ParallelDofs object at 0x2b824beacc00>\n"
     ]
    }
   ],
   "source": [
    "%%px --target=1,2\n",
    "print('a.mat.local_mat on rank', comm.rank, 'is a', type(a.mat.local_mat), 'of dimensions', a.mat.local_mat.height, a.mat.local_mat.width)\n",
    "print('lcoal fes ndof: ', fes.ndof)\n",
    "print('a.mat.row_pardofs: ', a.mat.row_pardofs)\n",
    "print('a.mat.col_pardofs: ', a.mat.col_pardofs)\n",
    "print('fes pardofs:       ', fes.ParallelDofs())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Each rank assembles it's local contribution to the global bilinear form into a sparse matrix, with dimensions matching that of the *local* FESpace!\n",
    "\n",
    "Let us assume we have some global numbering, and assume that $I_k$ is the set of indices corresponding to DOFs\n",
    "on rank $k$. \n",
    "\n",
    "The ebmedding matrices $E_k\\in\\mathbb{R}^{n_i\\times n}$ take local vectors of dimension $n_k$ and gives us global vectors of dimension $n$ .\n",
    "\n",
    "The global matrix $A$, operating on vectors of dimension $n$, can be assembled from the local matrices in the same way\n",
    "we usually assemble our FEM matrices from element matrices:\n",
    "\n",
    "$$\n",
    "A = \\sum_i E_i A^{(i)} E_i^T\n",
    "$$\n",
    "\n",
    "Importantly, the local matrices are **not** simply diagonal blocks of the global matrix,  $A^i$ only has partial values for DOFs that are shared with another rank, $A^{(i)} \\neq E_i^T A E_i$.\n",
    "\n",
    "![mat_distr](mat_distr.png)\n",
    "\n",
    "\n",
    "#### Note\n",
    "**We never globally assemble** $A$**!!**\n",
    "\n",
    "A common approach used by other software packages is to actually assemble $A$ and distribute it by rows.\n",
    "\n",
    "### Vectors\n",
    "\n",
    "Things look very similar for parallel vectors, they are again implemented as short, local vectors that\n",
    "make up the global one:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%px \n",
    "f = LinearForm(fes)\n",
    "f += SymbolicLFI(x*y*v)\n",
    "f.Assemble()\n",
    "gfu = GridFunction(fes)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "length of vector:     19\n",
      "length of local vec:  19\n",
      "dim local fes:        19\n",
      "dim global fes:       283\n"
     ]
    }
   ],
   "source": [
    "%%px --target 1\n",
    "print('length of vector:    ', len(gfu.vec))\n",
    "print('length of local vec: ', len(gfu.vec.local_vec))\n",
    "print('dim local fes:       ', fes.ndof)\n",
    "print('dim global fes:      ', fes.ndofglobal)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Parallel Vectors additionally have a \"ParallelStatus\", which can be:\n",
    "\n",
    "- **Cumulated**, whenhe local vectors $v^i$ are just restrictions of the global vector $v$:\n",
    "\n",
    "$$\n",
    "v^{(i)} = E_i^T v\n",
    "$$\n",
    "\n",
    "- **Distributed**, when, similarly to parallel matrices, the global vector is the sum of local contributions\n",
    "\n",
    "$$\n",
    "v = \\sum_i E_iv^{(i)}\n",
    "$$\n",
    "\n",
    "The vector of the linear form $f$ is a collection of locally assembled vectors, so it is distributed.\n",
    "\n",
    "The vector of the GridFunction gfu has been initialized with zeros, so it has consistent values, it is cumulated."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "status f vec:          PARALLEL_STATUS.DISTRIBUTED\n",
      "status vec.local_vec:  PARALLEL_STATUS.NOT_PARALLEL\n",
      "\n",
      "status gfu vec:        PARALLEL_STATUS.CUMULATED\n",
      "status vec.local_vec:  PARALLEL_STATUS.NOT_PARALLEL\n"
     ]
    }
   ],
   "source": [
    "%%px --target 1\n",
    "print('status f vec:         ', f.vec.GetParallelStatus())\n",
    "print('status vec.local_vec: ', f.vec.local_vec.GetParallelStatus())\n",
    "print('')\n",
    "print('status gfu vec:       ', gfu.vec.GetParallelStatus())\n",
    "print('status vec.local_vec: ', gfu.vec.local_vec.GetParallelStatus())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "*Multiplication of a parallel matrix with a cumulated vector gives a distributed one:*\n",
    "\n",
    "$$\n",
    "w = A v = (\\sum_i E_i A^{(i)} E_i^T) v = \\sum_i E_i A^{(i)} E_i^Tv = \\sum_i E_i A^{(i)}v^{(i)} = \\sum_i E_i w^{(i)}\n",
    "$$\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%px\n",
    "v = gfu.vec.CreateVector()\n",
    "w = gfu.vec.CreateVector()\n",
    "v[:] = 1.0\n",
    "w.data = a.mat * v"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "status v:  PARALLEL_STATUS.CUMULATED\n",
      "status w:  PARALLEL_STATUS.DISTRIBUTED\n"
     ]
    }
   ],
   "source": [
    "%%px --target 1\n",
    "print('status v: ', v.GetParallelStatus())\n",
    "print('status w: ', w.GetParallelStatus())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Solvers and Preconditioners with MPI\n",
    "\n",
    "Not all solvers and preconditioners included in NGSolve also work with MPI, but many do:\n",
    "### Direct Solvers\n",
    "- masterinverse: Collect the entire matrix on the \"master\" and invert sequentially there.\n",
    "- MUMPS inverse: Distributed parallel inverse, scalability is limited.\n",
    "\n",
    "### Preconditioners\n",
    "- BDDC\n",
    "- 'hypre': Boomer AMG\n",
    "- 'hypre_ams': Auxiliary Maxwell Space AMG \n",
    "- 'local'\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%px\n",
    "c = Preconditioner(a, 'hypre')\n",
    "#c = Preconditioner(a, 'bddc', inverse='mumps')\n",
    "a.Assemble()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[stdout:16] \n",
      "it =  0  err =  0.1099777875557949\n",
      "it =  1  err =  0.04115832958100785\n",
      "it =  2  err =  0.01640413427737909\n",
      "it =  3  err =  0.002399109415917421\n",
      "it =  4  err =  0.002415741796640736\n",
      "it =  5  err =  0.007649635824685178\n",
      "it =  6  err =  0.005797037984086125\n",
      "it =  7  err =  0.0018076379420930557\n",
      "it =  8  err =  0.001549550034376465\n",
      "it =  9  err =  0.003298447178932686\n",
      "it =  10  err =  0.0016507909555561475\n",
      "it =  11  err =  0.0006339283042538051\n",
      "it =  12  err =  0.0002833425899718731\n",
      "it =  13  err =  0.00010532173704651773\n",
      "it =  14  err =  3.2319246923796026e-05\n",
      "it =  15  err =  1.0179181507465682e-05\n",
      "it =  16  err =  3.2057362836107735e-06\n",
      "it =  17  err =  9.871239271369994e-07\n",
      "it =  18  err =  2.7671426326565997e-07\n"
     ]
    }
   ],
   "source": [
    "%%px\n",
    "gfu.vec.data = solvers.CG(mat=a.mat, pre=c.mat, rhs=f.vec, tol=1e-6, maxsteps=30, printrates=comm.rank==0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [],
   "source": [
    "stop_cluster()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
